Skip to content

vLLM OpenAI API server supports recording experience data#591

Open
pan-x-c wants to merge 79 commits into
agentscope-ai:mainfrom
pan-x-c:feature/model_self_record_experience
Open

vLLM OpenAI API server supports recording experience data#591
pan-x-c wants to merge 79 commits into
agentscope-ai:mainfrom
pan-x-c:feature/model_self_record_experience

Conversation

@pan-x-c

@pan-x-c pan-x-c commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Description

As the title says

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@pan-x-c pan-x-c marked this pull request as draft June 24, 2026 12:26
pan-x-c and others added 28 commits June 25, 2026 11:15
…th legacy)

Refactor experience production so heavy data (tokens/logprobs/routed_experts)
no longer rides runner->scheduler->coordinator as serialized bytes. The vLLM
recorder now captures it in-process into a MemoryStore keyed by task_id, and
the coordinator pulls it at finalize time via /records/consume_task. Runners
ship only a small reward map. Both paths coexist behind
`explorer.use_recorded_experience` (default off = legacy).

Recording module (trinity/common/models/vllm_patch/recording/):
- store: drop SqlStore; MemoryStore.update_reward_by_task_id stamps
  reward/run/task on a whole task-id group, pops and returns it (the
  in-memory replacement for the SQL HistoryRecorder join).
- recorder: track in-flight record tasks; add flush() (await pending +
  queue.join) so a consume sees a quiesced store; honor skip_recording_ctx.
- models: build_experience emits one Experience per completion (n>1) with
  info["sample_index"]; eid.suffix=request_id kept for traceability.
- context: add skip_recording_ctx; task_id already flows via api_key
  (RecordingIdentityMiddleware) and now also via VLLMModel.chat (Ray entry).
- query: POST /records/consume_task (flush -> update_reward_by_task_id ->
  serialize_many); drop the SqlStore 503 branch.
- config/server: remove RecordingConfig entirely; the logprob width is a
  recorder-internal constant (we store only the chosen token, which vLLM
  force-includes at logprobs=1). No static config threaded through launch.

task_id propagation (Ray entry, same contextvar as the HTTP middleware):
- vllm_model: chat/generate accept task_id_key, set task_id_ctx around
  _generate_internal; logprobs sets skip_recording_ctx (auxiliary forward).
- model: ModelWrapper.chat/chat_async forward task_id_key; SGLang.chat
  accepts-and-ignores it (recording is vLLM-only).

Coordinator + runner + workflow:
- rollout_coordinator: _resolve_rank_urls (ray.get_actor per engine) and a
  recording-mode finalize that fans out /records/consume_task per engine,
  deserializes, and feeds objects to the pipeline (no re-serialization).
- experience_pipeline: process_experiences(exps) public object entry.
- workflow_runner: recording mode returns a pickled reward map keyed by the
  per-sample task_id_key the workflow stamped; legacy path unchanged.
- workflow: SimpleWorkflow/AsyncSimpleWorkflow run a per-sample n=1 loop in
  recording mode (distinct task_id_key per sample == reward unit for GRPO),
  legacy n=repeat_times single-call path unchanged.
- config: ExplorerConfig.use_recorded_experience flag.

SQL path removal (MemoryStore only):
- delete proxy/recorder.py (HistoryRecorder) and proxy_test.py; proxy
  service/app drop /feedback, /commit, record_feedback, submit_experiences,
  ready_experiences (keep allocate_model + weight sync); allocator no longer
  fills record_db_url; drop InferenceModelConfig.record_db_url and the dead
  ExplorerConfig.db_url field; RecordingConfig deleted.

Serve-mode external reward reporting is intentionally left unimplemented this
version (proxy /feedback//commit removed); the affected serve integration
tests (TestServeWithTrainer, ServeTest) are skipped with a pointer to the
recording refactor plan. convert_messages_to_experience redirect (multi-turn)
is deferred with TODOs at its call sites.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-x-c/Trinity-RFT into feature/model_self_record_experience
@pan-x-c

pan-x-c commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator Author

/unittest-module-common

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

unittest: Run #1798

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Pending ⏳ Other ❓ Flaky 🍂 Duration ⏱️
97 91 5 1 0 0 0 41m 34s

Some tests failed!

Name Failure Message
❌ tests/common/vllm_test.py::TestAPIServerCommon::test_api The test failed in the call phase due to an assertion error
❌ tests/common/vllm_test.py::TestQwen35APIServerMultiModal::test_multi_modal_content The test failed in the call phase
❌ tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async The test failed in the call phase due to an assertion error
❌ tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls The test failed in the call phase due to an assertion error
❌ tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls The test failed in the call phase due to an assertion error

Github Test Reporter by CTRF 💚

@pan-x-c

pan-x-c commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator Author

/unittest-module-common

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

unittest: Run #1799

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Pending ⏳ Other ❓ Flaky 🍂 Duration ⏱️
97 96 0 1 0 0 0 43m 48s

🎉 All tests passed!

Github Test Reporter by CTRF 💚

@pan-x-c

pan-x-c commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator Author

/unittest-module-trainer

@pan-x-c pan-x-c marked this pull request as ready for review July 1, 2026 06:46
Comment on lines +204 to +205
completed_runs=self.__class__.can_repeat and self.repeat_times or 1,
total_runs=self.__class__.can_repeat and self.repeat_times or 1,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not need __class__

Comment on lines 316 to +318
self.repeat_times = repeat_times
self.task.rollout_args.n = repeat_times
self.run_id_base = run_id_base
self.task.rollout_args.n = repeat_times

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.repeat_times = repeat_times
self.task.rollout_args.n = repeat_times
self.run_id_base = run_id_base
self.task.rollout_args.n = repeat_times
supper().set_repeat_times(repeat_times, run_id_base)
self.task.rollout_args.n = repeat_times

Comment thread trinity/common/config.py
# ``enable_router_replay`` (mirrored to ``enable_return_routed_experts`` in
# ``config_validator``); it is not implied by ``enable_history``, so dense
# models can record history too.
enable_history: bool = False

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may set to True

Comment thread trinity/common/config.py
enable_history: bool = False

# For OpenAI API
enable_openai_api: bool = False

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may set to True

Comment on lines +154 to 155
response_text: str = "" # Text of the response
prompt_text: Optional[str] = None # Text of the prompt

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can response_text and prompt_text be the same default value

@pan-x-c pan-x-c force-pushed the feature/model_self_record_experience branch from c85db9e to 740805a Compare July 1, 2026 13:37
@pan-x-c

pan-x-c commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator Author

/unittest-module-trainer

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

unittest: Run #1803

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Pending ⏳ Other ❓ Flaky 🍂 Duration ⏱️
29 25 0 4 0 0 0 1h 17m

🎉 All tests passed!

Github Test Reporter by CTRF 💚

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants